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Abstract 


In  this  report,  we  present  research  toward  a  vision-based  landing  system  for  unmanned 
rotorcraft  in  unknown  terrain  that  is  centered  around  our  Recursive  Multi-Frame  Planar 
Parallax  algorithm  [1]  for  high-accuracy  terrain  mapping.  We  give  an  in-depth  description 
of  the  vision  system,  an  overview  of  our  experimental  platforms,  and  both  synthetic  and 
experimental  terrain  mapping  results. 
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Chapter  1 


Introduction 


The  research  in  this  report  was  born  from  a  desire  to  reconstruct  terrain  from  aerial  im¬ 
agery  online  in  real  time,  at  high  altitudes,  at  an  extreme  level  of  accuracy;  to  integrate  this 
terrain  information  into  a  global  map  that  an  intelligent  UAV  could  use  to  make  informed 
decisions;  and  to  be  able  to  efficiently  analyze  this  map  as  it  is  created  to  find  safe  land¬ 
ing  locations  for  an  unmanned  rotorcraft.  Dissatisfied  with  the  usual  sensing  paradigms, 
namely  laser  scanners  (which  are  easily  detected  and  which  require  high-powered  beams 
that  are  energy-  and  weight-intensive  for  operation  at  high  altitudes)  and  stereo  cameras 
(which  are  highly  inaccurate  at  high  altitudes  using  baselines  that  can  be  accommodated  on 
a  vehicle),  we  conceived  a  novel  high-accuracy  real-time  visual  reconstruction  paradigm 
using  a  single  moving  camera,  namely  the  Recursive  Multi-Frame  Planar  Parallax  algo¬ 
rithm  [1], 

Experimental  validation  for  the  mapping  and  landing  scheme  was  performed  on  a  full- 
sized  helicopter  with  the  cooperation  of  Boeing  Phantom  Works,  as  well  as  on  a  smaller 
rotorcraft  at  UC  Berkeley  (see  Figure  1.1).  Although  inaccuracy  in  current  real-time  camera 
localization  algorithms  ultimately  prohibited  us  from  performing  the  high-accuracy  map¬ 
ping  task  online  in  real  time,  we  were  able  to  demonstrate  accurate  real-time  performance 
for  all  other  parts  of  the  system  by  performing  the  camera  localization  offline.  We  believe 
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Figure  1.1:  The  Berkeley  UAV  testbed:  an  electrically -powered,  MPC-controlled  rotorcraft 
for  vision-based  landing  and  terrain  mapping.  Courtesy  of  David  Shim. 

that  accurate  real-time  camera  localization  is  required  for  any  system  that  wishes  to  inte¬ 
grate  high-altitude  terrain  measurements  over  time  and  motion  into  a  single  global  map, 
and  we  are  confident  that  further  work  in  this  area  will  produce  algorithms  that  are  up  to 
the  challenge. 

The  vision  landing  problem  has  been  addressed  in  many  previous  research  projects, 
although  many,  including  [3],  [4],  [5],  [6],  and  [7],  require  an  easily-recognizable  landing 
target.  No  landing  target  is  used  in  [8],  although  it  is  assumed  that  the  visual  axis  of  the 
camera  is  perpendicular  to  the  ground  and  that  the  image  contrast  is  higher  at  the  boundary 
of  obstacles  than  anywhere  else.  The  approach  most  similar  to  ours  is  that  of  A.  Johnson  et 
al.  at  JPL  [9],  although  their  use  of  only  two  images  at  a  time  (a  wide -baseline  stereo  pair 
over  time  from  a  single  camera)  restricts  their  3D  reconstruction  accuracy  at  high  altitudes. 

The  remainder  of  this  report  is  organized  as  follows:  In  Chapter  2,  we  give  an  overview 
of  the  vision  system  for  autonomous  rotorcraft  landing  and  mapping.  In  Chapter  3,  we 
give  a  more  in-depth  treatment  of  the  Recursive  Multi-Frame  Planar  Parallax  algorithm.  In 
Chapters  4  and  5,  we  discuss  two  real-time  camera  localization  techniques  that  were  inves- 
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tigated  during  the  course  of  the  project,  giving  an  analysis  of  their  accuracy.  In  Chapter  6 
we  discuss  the  vehicle  platforms  on  which  the  system  was  tested,  and  in  Chapter  7  we  give 
synthetic  and  experimental  results.  Finally,  in  Chapter  8  we  conclude. 
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Chapter  2 


Vision  System  Overview 


Much  of  the  content  of  this  chapter  is  based  on  unpublished  work  coauthored  with  David 
Shim,  Christopher  Geyer,  and  Shankar  Sastry  [2],  This  material  is  used  with  the  permission 
of  the  coauthors. 

2.1  Introduction 

The  computer  vision  problem  that  we  address  in  this  project  is  one  of  3D  terrain  reconstruc¬ 
tion  and  analysis.  In  particular,  we  are  trying  to  find  suitable  landing  locations,  i.e.  regions 
that  are  large  enough  to  safely  land  the  helicopter  that  are  flat  and  free  of  debris,  have  a 
slope  of  no  more  than  4  degrees,  have  been  confidently  mapped,  (possibly)  are  similar  to  a 
desired  constant  appearance,  and  (optionally)  contain  a  distinctive  landing  target1.  Despite 
the  availability  of  high-accuracy  active  technologies  such  as  radar  and  LIDAR,  we  use  a 
camera  for  this  task  because  it  is  passive  (and  hence  difficult  to  detect),  and  because  such 
active  technologies  require  high-powered  beams  that  are  energy-  and  weight-intensive  for 
operation  at  high  altitude. 

The  vision  system  (see  Figure  2.1)  consists  of  a  feature  tracking  thread,  which  tracks 

'The  landing  target  is  only  used  to  enable  the  operator  to  choose  a  specifi  c  location  if  many  locations  are 
suitable — the  vision  system  always  ensures  that  all  other  requirements  are  satisfi  ed. 
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Appearance  Appearance 


Figure  2.1:  Vision  system  architecture.  Note  that  the  target  detector  is  optional  and  does 
not  replace  3D  terrain  reconstruction  using  the  Recursive  Multi-Frame  Planar  Parallax 
(RMFPP)  algorithm. 


distinctive  image  points  through  the  image  sequence  and  stores  them  in  the  feature  repos¬ 
itory;  a  motion  stamping  thread,  which  uses  GPS/INS  data  and  feature  tracks  to  estimate 
the  global  position  and  orientation  of  the  camera  when  each  image  was  captured  and  to 
estimate  the  3D  locations  of  the  tracked  features  (the  latter  of  which  are  used  to  choose 
the  best  reference  plane  for  the  Recursive  Multi-Frame  Planar  Parallax  algorithm);  and  the 
mapping  thread,  which  adds  3D  points  to  its  modular  elevation  and  appearance  map  using 
the  Recursive  Multi-Frame  Planar  Parallax  algorithm.  The  vision  system  also  includes  two 
interchangeable  sets  of  external  interfaces:  in  flight  mode,  it  uses  a  custom  Firewire  capture 
thread,  which  stores  timestamped  captured  images  in  a  frame  repository,  and  an  external 
communication  thread,  which  receives  GPS/INS  and  other  vehicle  state  data  from,  and 
sends  desired  trajectory  information  to,  the  vehicle  control  computer;  in  simulation/replay 
mode  the  Firewire  capture  thread  is  replaced  by  a  custom  simulation/replay  thread,  and 
all  communication  through  the  external  communication  thread  is  redirected  to  the  simula¬ 
tion/replay  thread. 
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2.2  The  Recursive  Multi-Frame  Planar  Parallax 


Algorithm 

The  cornerstone  of  our  approach  is  our  novel  Recursive2  Multi-Frame  Planar  Parallax 
(RMFPP)  algorithm  [1],  The  RMFPP  algorithm  is  a  direct3  method  for  obtaining  dense4 
structure  (terrain,  in  our  case)  estimates  with  corresponding  appearance  online  in  real  time 
by  using  a  single  moving  camera  whose  motion  has  been  accurately  estimated.  We  choose 
to  use  this  single-camera  method  because  of  the  inaccuracy  inherent  in  estimating  distant 
terrain  using  a  stereo  camera  pair  with  a  baseline  that  is  attainable  on  the  vehicle  (see  dis¬ 
cussion  in  Chapter  3),  while  using  multiple  images  as  the  camera  moves  through  space 
allows  the  RMFPP  algorithm  to  attain  expected  range  error  that  increases  between  linearly 
and  with  the  square  root  of  the  range. 

Suppose  a  camera  takes  images  i  =  1,  —  ,  m  of  a  rigid  scene,  where  image  1  is  the 
reference  view  in  which  range  will  be  estimated  for  each  pixel.  Then  the  homographies  Hj 
that  transfer  the  i-th  view  to  the  reference  view  via  a  chosen  reference  plane  are  given  by: 

1  V1 

-- TiNT  K_1  e  R3x3,  (2.1) 

a  ) 

where  ( N  6  E3,  d  e  E3)  are  the  unit  normal  of  the  reference  plane  in  the  coordinate 
system  of  the  first  camera  and  the  perpendicular  distance  of  the  first  viewpoint  from  the 
reference  plane,  (Ri  e  £0(3) ,  T*  e  E3)  are  the  rotation  and  translation  from  first  camera 
coordinate  system  to  the  i-th  one,  and  K  e  SL( 3)  is  the  constant  intrinsic  calibration  matrix 
of  the  camera. 

Suppose  that  X  e  K3  is  a  point  in  space  in  the  coordinate  system  of  the  first  camera. 

2The  cost  of  incorporating  measurements  from  a  new  image  depends  only  on  the  number  of  pixels  in  the 
image  and  does  not  depend  on  the  number  of  images  already  seen. 

3The  algorithm  expresses  a  cost  function  directly  in  terms  of  the  image  rather  than  depending  on  fea¬ 
ture  matching,  and  gradients  of  the  cost  function  are  calculated  by  linearization  of  the  brightness  constancy 
constraint  (see  pro:  [10],  con:  [11]). 

4The  algorithm  provides  a  depth  estimate  for  every  pixel  that  is  within  a  suffi  ciently  textured  region. 
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Let  pi  =  ( Xi,y.i )  for  i  —  1, m  be  the  projection  of  X  into  each  image,  tt (x,y,z)  = 
(x/z,  y/z ),  and  n*{x,  y )  =  (x,  y,  1).  The  quantity 

Pi  -  vr  (  H|  n*(pi ))  (2.2) 

' - v - ' 

Pi' 

is  called  planar  parallax,  and  is  zero  if  X  lies  on  the  reference  plane.  The  RMFPP  algorithm 
uses  planar  parallax,  which  is  small  for  small  movements  if  X  is  close  to  the  reference 
plane  and  increases  with  increased  camera  motion,  to  recursively  estimate  the  quantity 
7  =  h/ z  for  each  pixel  pi  in  the  reference  image,  where  z  is  the  range  of  X  in  the  first 
view  and  h  =  N  r X  +  d  is  the  signed  perpendicular  distance  of  X  from  the  reference 
plane.  We  then  recover  the  range  z  using  z  =  —d/(N1  K_1 i r*(  pi )  —  7). 

The  RMFPP  algorithm  will  be  discussed  in  more  detail  in  Chapter  3. 

2.3  Motion  Stamping 

In  order  to  use  the  RMFPP  algorithm,  we  must  first  motion  stamp  each  image,  i.e.  de¬ 
termine  the  orientation  and  position  of  the  camera  when  each  image  is  captured  using  a 
combination  of  (noisy)  vehicle  state  data  and  (noisy)  image  feature  tracks.  We  will  discuss 
a  probabilistic  approach  to  this  problem  in  Chapter  4  and  an  analytic  approach  in  Chapter 
5,  although  we  will  ultimately  conclude  that  the  accuracy  of  these  online  algorithms  has 
thus  far  proven  to  be  insufficient  for  our  needs.  While  we  continue  to  experiment  with 
alternative  online  motion-stamping  methods,  for  the  purposes  of  the  experimental  results 
in  this  report  we  use  an  offline  motion  stamping  method  to  showcase  the  performance  of 
the  other  components:  we  perform  SIFT  [12]  feature  tracking  followed  by  Sparse  Bundle 
Adjustment  (SBA)  [13],  where  we  initialize  the  camera  orientations  and  positions  using 
the  previous  GPS/INS  datapoints  corrected  by  a  constant  coordinate  transformation  (see 
Section  6.4).  Using  this  offline  method  requires  the  separation  of  the  experiment  into  three 
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pieces,  but  the  vision  high-level  planner  still  directs  the  helicopter  flight  while  it  collects 
image  data  and  vehicle  state  data,  the  RMFPP  algorithm  is  still  run  in  better  than  real  time 
on  hardware  similar  to  that  on  the  helicopter  vision  computer,  and  we  successfully  execute 
the  closer  inspection  and  landing  maneuvers  using  a  premade  elevation  and  appearance 
map. 


2.4  Modular  Elevation  and  Appearance  Map,  and  Land¬ 
ing  Site  Quality 

After  it  is  filtered  for  outliers  (see  discussion  in  Chapter  3),  the  list  of  3D  terrain  and 
appearance  points  produced  by  the  RMFPP  algorithm  is  stored  in  a  modular  elevation  and 
appearance  map.  The  map  is  represented  as  a  2D  (x,  y)  grid  with  three  layers  at  each  of 
multiple  resolutions:  terrain  elevation  z,  terrain  elevation  variance  (expected  squared  error) 
^7,  and  appearance  a.  The  grid  is  modular  in  the  sense  that  it  is  broken  into  fixed-sized 
rectangular  blocks  of  real  2D  space  and  that  only  blocks  in  which  points  have  been  observed 
are  present  in  the  map.  All  blocks  contain  all  resolutions,  and  higher  resolutions  in  each 
block  contain  more  pixels  in  each  layer  than  lower  resolutions;  for  operations  that  require 
only  a  single  resolution,  such  as  calculating  landing  quality  and  exporting  maps,  each  map 
block  independently  chooses  its  highest  resolution  where  at  least  a  fixed  percentage  of 
pixels  have  known  value  (or  its  lowest  resolution  if  none  of  its  resolutions  have  enough 
pixels  with  known  value). 

The  modular  elevation  and  appearance  map  is  designed  to  be  efficient  and  robust.  To 
constrain  memory  usage,  only  a  fixed  maximum  number  of  blocks  can  be  present;  the 
least  recently  accessed  block  is  recycled  when  a  new  block  is  needed  and  no  more  blocks 
are  available.  Because  the  scene  is  likely  to  change  over  long  periods  of  time,  blocks  are 
reinitialized  when  they  are  revisited  after  no  updates  for  a  fixed  period  of  time.  To  reduce 
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the  update  and  creation  of  blocks  due  to  outliers,  a  fixed  number  of  points  from  a  given 
RMFPP  update  must  be  contained  in  an  existing  or  potential  block  for  it  to  be  updated  or 
created.  To  reduce  the  number  of  landing  candidates  generated  by  a  suitable  region,  only 
block  centers  are  considered  as  possible  landing  sites.  To  eliminate  the  trade-off  between 
requiring  large  landing  sites  and  having  many  mostly-unexplored  blocks,  and  to  allow  more 
dense  possible  landing  sites,  the  landing  quality  score  of  a  given  block  is  calculated  over 
itself  and  a  given  radius  of  its  neighbor  blocks. 

To  exploit  the  expectation  that  a  batch  of  3D  points  from  the  RMFPP  algorithm  is  from 
a  contiguous  area,  blocks  are  bidirectionally  liked  to  their  immediate  neighbors  as  well  as 
being  maintained  in  a  hash  table  over  their  (x,  y )  centers,  and  all  existing  blocks  that  are 
required  for  the  given  list  of  3D  terrain  and  appearance  points  produced  by  the  RMFPP 
algorithm  are  retrieved  immediately  upon  determination  of  the  points’  2D  bounding  box. 
Adding  each  3D  point  to  the  map  involves  creating  or  locating  the  proper  map  block  in  the 
prefetched  grid  and  then  updating  the  closest  2D  map  block  pixel  at  each  resolution,  i.e. 
optimally  updating  the  pixel  at  each  layer  based  on  the  existing  elevation  variance  at  the 
pixel  and  the  elevation  variance  for  the  new  3D  point  as  provided  by  the  RMFPP  algorithm. 

For  maximum  efficiency,  landing  quality  scores  for  each  map  block  are  calculated 
within  the  map  module  and  the  time  required  to  update  these  scores  for  a  batch  of  3D 
points  from  the  RMFPP  algorithm  is  linear  in  the  number  of  points  and  in  the  size  of  the 
points’  2D  bounding  box.  Let  S  be  the  set  of  2D  grid  points  with  known  values  in  a  given 
map  block.  The  following  statistics  are  maintained  at  each  map  block  at  each  resolution: 
the  number  of  2D  grid  points  whose  values  are  known  and  the  total  number  of  2D  points  in 
the  block,  from  which  the  fraction  of  unknown  points  can  be  computed;  a  count  of  landing 
target  detections  and  a  sum  over  target  qualities  as  given  by  the  target  detector  (optional, 
not  used  to  obtain  the  experimental  results  in  this  report),  from  which  the  average  target 
quality  can  be  computed;  J ^ieswix *,  Y.i&swiV^  ^2ies  ^2ieS  Wixixi,  ^2ieS  WiXiVu 
'£lieswixizi,  V,:  s  «•,//,//„  J2ieswiyizi’  J2i&swizizi’  and  from  which  the  best- 
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fit  plane  and  plane-fit  error  can  be  computed;  and  JA  s  a>  anc'  a*a*’  from  which  the 
average  appearance  and  appearance  variance  can  be  computed.  When  a  2D  pixel  is  updated 
in  a  given  block  and  resolution,  its  previous  value  is  subtracted  from,  and  its  new  value  is 
added  to,  each  statistic  that  is  maintained  for  that  block  and  resolution. 

To  calculate  the  landing  quality  score  for  all  modified  blocks  after  an  RMFPP  map 
update,  an  integral  image  [14]  of  the  maintained  statistics  is  formed  over  a  rectangular 
area  suitably  larger  than  the  modified  area  of  the  map,  and  the  statistics  over  a  block  and 
its  given  radius  of  neighbors  are  calculated  in  constant  time5  by  adding  and  subtracting 
appropriate  elements  of  the  integral  image.  The  landing  quality  score  for  each  modified 
block  (combined  with  its  radius  of  neighbors)  is  a  linear  combination  of  the  angle  of  the 
best-fit  plane  from  horizontal,  the  plane  fit  error,  the  percentage  of  grid  squares  in  the  block 
with  unknown  value,  the  difference  between  the  average  appearance  and  a  given  desired 
appearance,  the  appearance  variance,  and  the  average  target  quality  (optional).  A  block 
is  considered  to  be  a  landing  candidate  if  it  is  below  given  thresholds  on  each  element  of 
the  landing  quality  equation,  and  landing  candidates  are  maintained  in  a  priority  queue  so 
that,  all  other  things  being  equal,  the  better  (lower  ‘landing  quality’  value)  candidates  are 
considered  first. 


2.5  Target  Detector 

In  case  multiple  safe  landing  sites  are  available,  we  want  the  vision  system  to  be  able  to 
recognize  a  distinctive  target  on  the  ground  (see  Figure  2.2).  The  target  detector  operates 
directly  on  each  RMFPP  reference  image,  and  the  3D  output  of  the  RMFPP  algorithm 
for  that  image  is  used  to  determine  the  3D  location  of  any  detected  landing  target.  As 
discussed  in  Section  2.4,  target  detection  counts  are  used  as  one  of  one  of  several  features 
for  determining  the  quality  of  a  landing  site.  Hence,  target  detection  does  not  replace  3D 

5The  time  required  by  this  operation  is  independent  of  the  block  radius  and  the  radius  of  the  desired 
landing  site. 
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Figure  2.2:  View  of  the  landing  target  from  the  air. 


terrain  reconstruction  using  the  RMFPP  algorithm. 

The  target  detector  first  detects  corners  in  the  image  as  local  maxima  of  a  filter  com¬ 
posed  of  separable  kernels,  whose  span  resembles  typical  comers.  It  then  determines 
the  homography  h  e  H,  where  HI  is  a  discrete  set  of  homographies,  that  best  trans¬ 
forms  the  n  points  into  points  with  integer  coordinates  according  to  the  function  f(h)  = 
£”=i  (exp^df)'1  for  a  given  a,  where  d%  is  the  distance  of  the  i-th  transformed  point 
from  the  closest  integer  coordinates.  The  best  homography  must  meet  or  exceed  a  given 
function  value  threshold  r  to  be  considered  a  positive  detection,  so  many  of  the  homo¬ 
graphies  can  be  eliminated  without  computing  the  entire  sum  when  the  number  of  points 
remaining  to  incorporate  into  the  sum  is  less  than  the  difference  between  the  threshold  and 
the  current  value  of  the  sum,  or  when  the  number  of  points  remaining  to  incorporate  into 
the  sum  is  less  than  the  difference  between  the  best  final  function  value  so  far  and  the  cur¬ 
rent  value  of  the  sum  (since  each  point  can  contribute  at  most  1  to  the  sum).  Finally,  if  a 
suitable  best  homography  h  has  been  found,  the  target  detector  determines  the  subset  of  the 
n  points  that  have  transformed  coordinates  within  a  given  distance  q  of  integer  coordinates; 
if  none  of  them  are  within  this  distance,  the  detector  determines  that  no  target  is  present, 
otherwise  it  determines  that  a  target  is  present  at  the  median  x  and  y  image  coordinates  of 
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Figure  2.3:  An  experimental  ROC  curve  of  the  landing  target  detector  for  different  values 
of  r. 


the  n  points,  with  quality  1  —  exp  (f(h)  —  q)2  for  a  given  w  (a  lower  ‘quality’  value 
means  a  more  confident  detection). 

The  target  detector  was  tested  on  images  captured  by  both  the  Boeing  and  Berkeley  ve¬ 
hicles  at  multiple  locations,  with  synthetic  landing  targets  at  random  scales  and  orientations 
inserted  at  random  image  locations.  Figure  2.3  shows  the  ROC  curve  for  different  values 
of  r,  which  gives  a  result  of  80%  true  detections  and  1%  false  detections  on  the  test  data. 


2.6  High-level  Planner 

Concurrently  with  the  above  vision  algorithms,  the  vision  system  executes  a  high-level 
planner  that  operates  directly  on  the  vehicle  state  data.  The  default  plan  (when  no  landing 
site  candidates  have  been  identified  within  a  given  maximum  radius  of  the  current  location) 
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Box  Search 
(Top  View) 


(Top  View) 


▼ 

Final  Descent 
(Side  View) 


Figure  2.4:  Plans  for  the  high-level  planner. 


is  an  outwardly-expanding  box  search  centered  around  the  point,  and  at  the  altitude  of, 
where  the  planner  is  initially  enabled  (see  Figure  2.4).  When  a  landing  site  candidate  is 
identified  that  is  within  the  given  maximum  radius,  the  planner  enters  a  mode  where  it 
directs  a  descending  spiral  toward  a  point  a  fixed  distance  directly  over  the  candidate  site. 
The  candidate  site  is  examined  whenever  it  is  visible  during  the  downward  spiral,  and  all 
other  visible  locations  are  also  examined  at  closer  range  during  this  process.  At  any  time 
during  the  spiral,  the  vision  system  may  determine  that  the  site  is  unsuitable,  or  the  human 
operator  may  signal  that  the  site  is  unsuitable,  and  the  planner  will  switch  to  a  different 
candidate  site  (or  return  to  the  default  box  search  plan  if  there  are  no  nearby  candidate 
sites).  Once  the  helicopter  reaches  a  point  a  fixed  distance  directly  over  the  candidate 
site,  the  human  operator  may  approve  an  autonomous  landing,  at  which  time  the  planner 
directs  a  constant- speed  vertical  descent  to  a  fixed  lower  altitude  AGL,  followed  by  a  slower 
constant-speed  vertical  descent  to  the  ground. 
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2.7  Conclusion 


In  this  chapter,  we  have  given  an  overview  of  the  vision  system  used  in  the  autonomous 
mapping  and  landing  project.  We  will  give  more  details  of  the  crucial  pieces  in  later  chap¬ 
ters:  the  Recursive  Multi-Frame  Planar  Parallax  algorithm  in  Chapter  3,  and  the  motion 
stamping  problem  in  Chapters  4  and  5.  We  will  give  results  from  the  RMFPP  algorithm, 
both  alone  and  as  part  of  the  system,  in  Chapter  7. 
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Chapter  3 


The  Recursive  Multi-Frame  Planar 
Parallax  Algorithm 


The  content  of  this  chapter  is  based  on  published  work  coauthored  with  Christopher  Geyer, 
Marci  Meingast,  and  Shankar  Sastry.  This  material  is  used  with  the  permission  of  the 
coauthors.  ©2006  IEEE.  Reprinted,  with  permission,  from  [1  ]. 
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3.A  World  Coordinates 


Combining  the  equation  for  height  above  the  reference  plane  in  the  frame  of  the  reference 
camera  and  the  definition  of  7,  the  point  p  =  (x.  y )  in  the  reference  image  has  z  coordinate 
in  the  frame  of  the  reference  camera 


z  = 


d\ 

NtX' -7’ 


(3.20) 


where  X'  =  K  1 7 r*  (p).  Back-projecting  into  3-dimensional  space,  the  point  X  and  its 
covariance  in  the  frame  of  the  reference  camera  cov  (X)  are  given  by: 


x  =  d\  X' 

NtX'  -7 

(3.21) 

cov  (X)  =  J  var  (7)  J7 

(3.22) 

where  var  (7)  =  — 

Zj/± 

j=  dl  2X'. 

/  htT  \  2 

(3.23) 

(NtX'-^) 


To  construct  an  elevation  map  over  multiple  reference  frames,  the  point  and  its  covariance 
can  be  transformed  into  the  world  coordinate  system  using  the  known  location  and  orien¬ 
tation  of  the  reference  camera. 
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Chapter  4 


Probabilistic  Motion  Stamping 

4.1  Introduction 

This  chapter  describes  a  motion  filter  that  uses  both  (noisy)  feature  image  locations  from  a 
feature  tracker  and  (noisy)  attitude  and  position  measurements,  which  makes  it  well-suited 
for  an  Unmanned  Aerial  Vehicle  (UAV)  that  is  equipped  with  a  camera  and  a  GPS/INS 
unit.  It  is  designed  for  tight  integration  with  a  feature  tracker  and  a  map  builder  using  the 
Recursive  Multi-Frame  Planar  Parallax  algorithm.  It  provides  an  estimate  of  the  current 
attitude  and  position  of  the  camera  (motion  stamp),  relative  to  a  fixed  world  coordinate 
frame,  for  each  captured  image;  this  estimate  is  essential  to  the  operation  of  the  map  builder. 
It  can  also  predict  the  locations  of  the  previous  features  in  the  current  frame,  in  order  to 
limit  the  feature  tracker’s  search  space,  as  it  evolves  forward  by  an  arbitrary  time  step  to 
the  arrival  time  of  the  current  frame. 


4.2  Requirements 

In  these  systems,  image  and  GPS/INS  data  may  not  arrive  simultaneously  or  with  the  same 
frequencies,  or  even  with  consistent  frequencies.  It  is  therefore  necessary  to  process  the 
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two  types  of  measurements  separately  while  still  combining  them  into  a  single  estimate  of 
the  underlying  system  state.  The  state  belief  must  be  able  to  evolve  forward  by  a  variable 
amount  of  time  to  accommodate  various  time  differences  between  measurements.  It  is 
highly  desirable  to  provide  predictions  of  the  locations  of  the  currently-tracked  features 
in  the  current  frame  so  that  the  feature  tracker  can  limit  its  search  space  for  each  feature, 
shortening  its  runtime  and  increasing  its  accuracy.  The  filter  also  must  provide  an  estimate 
of  the  current  position  of  the  camera  relative  to  a  fixed  coordinate  frame  for  the  map-builder. 


4.3  Notation 


The  notation  used  in  this  chapter  is  relatively  standard  but  it  is  defined  here  for  complete¬ 
ness. 

Let  the  function  7 r  represent  the  projection  of  a  pinhole  camera,  namely 


l 

\ 

X 

u 

=  7 r 

y 

V 

L  - 

V 

z 

/ 

/ 


x 


(4.1) 


where 


x  y  z 


is  a  point  in  3-dimensional  Euclidean  space, 


U  V 


is  the  point’s 


projection  onto  the  image  plane,  and  /  is  the  focal  length  of  the  camera.  The  resulting  vec¬ 


tor 


u  v 


is  used  interchangeably  with  its  homogeneous  representation 


u  v  1 


to  write  expressions  such  as  A y0  where  A  is  a  matrix  with  3  columns  and  y0  is  the  projec¬ 
tion  of  a  point  in  space  onto  the  image  plane.  Similarly,  to  find  the  perpendicular  distance 

r  1 T 

between  a  point  in  space  and  the  camera’s  focal  point,  p  =  Z  =  el  \  x  y  z 

This  chapter  uses  the  axis-angle  parameterization  of  rotations  to  represent  angular  ve¬ 
locities.  Define  the  skew-symmetric  matrix  u>  in  so(3)  corresponding  to  the  vector  u>  = 
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00  l  CU2  OJ3 


by 


Co  = 


0 

LO3 

—lo2 


—UJ3 

0 

00  1 


Co>2 

-CUi 

0 


(4.2) 


The  matrix  exponential  of  0  is  a  rotation  matrix  in  50(3)  and  is  easily  computed  using 
Rodrigues’  formula 


c 0 


=  e“  =  |  +  sin(||u>||)  + 


LO¬ 


CO 


00 


(1  —  cos(||cu| 


(4.3) 


The  inverse  of  Rodrigues’  formula  is  denoted  by  the  function  JO1 2  =  Logso(3)-  Note  that 

in  the  case  of  the  system  state  £  is  the  state  estimate  not  a  skew-symmetric  matrix. 

Although  the  axis-angle  parameterization  is  convenient  because  it  requires  only  three 

parameters  to  represent  any  rotation  and  because  it  makes  it  simple  to  scale  the  magnitude 

of  the  corresponding  rotation,  it  suffers  from  a  singularity  in  its  Jacobian.  Therefore,  this 

chapter  uses  quaternions,  which  do  not  suffer  from  such  a  singularity,  to  represent  attitudes, 
r  iT 


Let  q 


q  1  q2  <?4 


be  a  quaternion.  A  unit  quaternion  equivalent  to  q  is 


Qw 

q\ 

Qx 

1 

q2 

% 

~  Nl 

qs 

_  Qz  _ 

q\ 

and  the  corresponding  rotation  matrix  is 


R(q) 


1  -  H  H 

2  qxqy  +  2  qzqw 
2qxqz  ^qyqw 


2 qxqy  ~  2 qzqw 
1  -  H.  -  2  ql 
2 qyqz  T  2qxqw 


2  qxqz  +  2  qyqw 
2 qyqz  ~  2 qxqw 


(4.4) 
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(4.5) 
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Figure  4.1:  Model  coordinate  frames  and  transformations. 


The  inverse  of  this  function  results  in  a  unit  quaternion  and  is  denoted  by  i?_1.  Note  that 
the  corresponding  functions  for  the  axis-angle  parameterization  are  also  denoted  by  R  and 
R-1;  the  appropriate  one  to  use  is  determined  by  whether  the  argument,  in  the  case  of  R, 
or  the  result,  in  the  case  of  R^1,  is  an  angular  velocity  in  the  axis-angle  representation  or 
an  attitude  in  the  quaternion  representation. 


4.4  System  Model 

The  motion  of  the  system  is  defined  with  reference  frames  and  transformations  as  illustrated 
in  Figure  4.1.  f2,  T,  uj,  and  V  are  the  attitude,  position,  angular  velocity,  and  linear  velocity 
of  the  camera  with  respect  to  its  current  configuration.  flrei  and  Trei  are  the  attitude  and 
position  of  the  camera  with  respect  to  the  GPS/INS  unit.  flwr  and  Twr  are  the  attitude  and 
position  of  the  vehicle  (as  measured  at  the  GPS/INS  unit)  with  respect  to  the  global  GPS 
zero.  The  motion  of  the  vehicle  is  modeled  as  Brownian  motion  in  uj  and  V. 

The  location  of  the  ith  feature  is  represented  by  its  projection  yl0  onto  the  initial  image 
plane  and  by  its  distance  p1  from  the  focal  point  of  the  initial  camera.  The  focal  length  /  of 
the  camera  is  also  included  in  the  model  and  is  modeled  as  a  random  walk  if  it  can  change 
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over  time. 


The  evolution  of  the  model  is  given  by 


y'o{t  +  At)  =  y*0(t)  i  —  1, . . 

;N  Vo(°)  =  Vo 

(4.6) 

pl{t  +  At)  =  p\t)  i  —  1,.. 

.,N  p’(0)=p* 

(4.7) 

fl(t  +  At)  =  R-\R(cj(t)  At)  R(n(t))) 

n(o)  =  R-\\) 

(4.8) 

T(t  +  At)  =  R(w(t)  At)  T(t)  +  V{t)  At 

T(0)  =  0 

(4.9) 

u )(t  +  At)  =  oj(t)  +  aw(t) 

w(0)  =  lvq 

(4.10) 

V{t  +  At)  =  V{t)  +  av{t) 

V(0)  =  V0 

(4.11) 

£2  writ  +  At)  =  O  wr(t) 

£2Wr(0)  ^wrO 

(4.12) 

Twr(t  +  At)  =  Twr(t ) 

Twr(0)  —  Twro 

(4.13) 

arel{t  +  At)  =  nrel(t) 

flreli  0)  =  ^relO 

(4.14) 

T  rei(t  +  At)  =  T  rei{t) 

T Ve/(0)  =  Trel  0 

(4.15) 

f(t  +  At)  =  f(t)+af(t) 

/( 0)  =  /o 

(4.16) 

where  av(f)  and  aw(t)  represent  the  Brownian  motion  in  the  velocities  of  the  vehicle  and 
otf(t)  represents  a  random  walk  in  /. 

The  observable  measurements  of  the  system  are  the  projection  y^it)  of  the  ith  feature 
onto  the  current  image  plane  (measured  by  the  feature  tracker)  and  the  attitude  f lw(t)  and 
position  Tw(t)  of  vehicle  with  respect  to  the  global  GPS  zero  (measured  by  the  GPS/INS 
unit).  These  measurements  are  given  by 

y\t)  =  7T(R(ft(t))y})(t)p\t)  +  T(t))+nl(t),x  =  l,...,N  (4.17) 

aw(t)  =  i?-1(i?(0^(f))i?(0rd(f))i?(0(f))-1i?(0rei(i))_1)+na-(f)  (4.18) 

Tw{t)  =  R(flwr(t))(T: relit)  -  R(rtrei(t))  i?(i7(f))_1  A (t))  +  Twr(t)  +  nTw{t) , 

where  A(t)  A  ^(^(t))”1  Trel(t)  +  T(t)  (4.19) 
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Measurement 


Predicted  measurement 


Evolution: 

(1)  Linearize  state  transition  model 
about  posterior  state 

(2)  Project  state  ahead  (prior  state) 
and  predict  measurement 

(3)  Project  error  covariance  ahead 
(prior  covariance) 


(1)  Copy  prior  state  to  posterior  state 

(2)  Predict  measurement  and  linearize 
measurement  model  about  posterior 
state 

(3)  Update  state  based  on  difference 
between  measurement  and  predicted 
measurement  (posterior  state) 

(4)  Update  error  covariance  (posterior 
covariance) 

(5)  Repeat  2-4  until  convergence  or  until 
do  maximum  number  of  iterations 


Initial  estimate 


Figure  4.2:  Iterated  extended  Kalman  filter. 


where  nl(t),  n{lu:  (t),  and  nTw(t )  are  additive  Gaussian  measurement  noise. 

This  is  an  extension  of  the  model  in  [29]  to  include  attitude  and  position  measurements 
(f lwr,  Twr.  ft-rei,  Trei,  ftw,  Tw)  and  to  allow  evolution  by  an  arbitrary  nonnegative  At. 
It  also  eliminates  the  special  features  used  for  scale  in  [29]  and  instead  makes  the  system 
observable  using  attitude  and  position  measurements. 


4.5  Implementation 

The  filter  is  implemented  as  an  iterated  extended  Kalman  filter  in  approximately  4000  lines 
of  C++  code.  The  extended  Kalman  filter  is  an  extension  of  the  extended  Kalman  filter 
that  allows  multiple  innovation  iterations  per  evolution  to  achieve  better  convergence  of 
the  nonlinear  system.  Figure  4.2  illustrates  the  operation  of  the  iterated  extended  Kalman 
filter,  where  in  this  case  measurements  are  predicted  both  in  the  evolution  step  (to  narrow 
the  search  space  of  the  feature  tracker)  and  in  the  innovation  step  (to  compare  to  the  actual 
measurements).  For  full  descriptions  and  derivations  of  the  extended  Kalman  filter  and  the 
iterated  extended  Kalman  filter,  see  [30]  and  [31]. 
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4.5.1  Measurements 


The  fact  that  feature  and  GPS/INS  measurements  are  available  at  different  times  and  with 
different  frequencies  requires  that  the  two  sets  of  measurements  cannot  be  evolved  and 
innovated  together.  Instead,  there  is  one  set  of  evolution  and  innovation  routines  for  feature 
measurements  and  another  for  GPS/INS  measurements.  These  routines  must  be  run  in 
pairs-a  feature  innovation  must  follow  a  feature  evolution  and  an  attitude  and  position 
innovation  must  follow  an  attitude  and  position  evolution.  Separating  the  evolution  from 
the  innovation  allows  the  feature  tracker  to  receive  a  new  frame,  evolve  the  filter  to  the 
time  when  the  frame  arrived  and  get  predicted  feature  locations  for  that  frame,  locate  the 
features  using  the  predictions  to  limit  its  search,  and  finally  innovate  the  filter  using  the 
observed  projections  of  the  features  onto  the  image  plane. 

4.5.2  Appearing  and  Disappearing  Features 

Because  features  appear  and  disappear  due  to  motion,  nonvisible  features  must  be  removed 
from  the  model  and  new  features  must  be  added.  For  reasons  of  practicality,  a  feature 
that  disappears  and  then  later  reappears  is  treated  as  two  different  features.  Removing  a 
feature  is  trivial  (it  only  involves  removing  the  feature  from  the  state  estimate),  but  adding 
a  new  feature  is  problematic  because  its  initial  distance  from  the  focal  point  of  the  camera 
is  unknown.  To  get  around  this  problem,  as  in  [29]  each  new  feature  is  initially  assigned 
an  arbitrary  distance  pl  with  large  variance  and  placed  in  its  own  subfilter  so  that  its  large 
distance  variance  does  not  affect  the  main  filter.  A  new  feature  is  inserted  into  the  main 
filter  from  its  subfilter  when  its  depth  variance  becomes  comparable  to  the  depth  variance 
of  the  features  in  the  main  filter  (implemented  as  when  its  depth  variance  becomes  less 
than  or  equal  to  the  mean  depth  variance  of  the  features  in  the  main  filter  or  when  the  depth 
variance  becomes  less  than  or  equal  to  a  fixed  threshold). 
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4.5.3  Initialization 


The  main  filter  state 


£  = 


IT  1 

Vo  P 


y%T  pn  n3 


UT 


v 1  n 


a 


rel 


rel 


f 


(4.20) 


is  initialized  using  the  measurements  yl( 0)  for  ;y('J(0);  arbitrary  distances  for  pl ;  Ii  ~ 1  (  I ) 
for  f2(0);  0  for  T(0),  u>(0),  and  V(0);  the  most  recent  GPS/INS  reading  for  fTr(0)  and 
Twr( 0);  a  given  estimate  of  Ore/  and  Trel  for  f2rez(0)  and  Trfd(())  (determined  by  the  ve¬ 
hicle);  and  a  given  estimate  of  /  for  /( 0)  (determined  by  the  camera).  The  main  filter 
covariance  P  is  initialized  to  be  block  diagonal  with  the  covariance  of  the  blocks  corre¬ 
sponding  to  the  y'{)  determined  by  analysis  of  the  feature  tracker,  a  relatively  large  number 
for  the  blocks  corresponding  to  the  p\  0  for  the  blocks  corresponding  to  il  and  T,  a  rel¬ 
atively  large  number  on  the  diagonal  for  the  blocks  corresponding  to  u>  and  V,  a  block 
corresponding  to  f 2wr  and  Twr  that  is  given  by  analysis  of  GPS/INS  measurements,  a  rel¬ 
atively  small  number  on  the  diagonal  for  the  blocks  corresponding  to  Qrei  and  T red,  and  a 
portion  such  as  0.1  of  /( 0)  as  the  initial  variance  of  /. 

No  subfilters  are  created  during  initialization,  and  no  features  are  allowed  to  move  from 
a  subfilter  to  the  main  filter  until  after  the  first  10  sets  of  feature  evolutions  and  innovations 
(of  the  main  filter)  to  allow  the  main  filter  to  stabilize. 


4.5.4  Main  Filter 

The  system  model  in  equations  4.6-4.19  above  can  be  written  in  summary  form  as 


£(t  +  At)  =  f{€(t))  +  w(t)  w(t)  ~  AA(0,  T.w)  (4.21) 


Vfeat(t) 

hfeat(£{t)) 

+  n(t)  n(t )  ~  JV(0,  Tn) 

(4.22) 

= 

ybodyi. t) 

hbody  (t)  ) 
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where  yfeat(t)  is  the  y\t)  and  ybody{t)  is  Vtw{t)  and  Tw(t). 

As  in  [29],  modified  to  iterate  the  innovation  (IEKF)  as  in  [31],  the  main  filter  imple¬ 
ments  the  following  equations: 

Evolution: 


£(t  +  At|t)  =  /(£(t|t)) 


(4.23) 


P(t  +  At\t)  =  F(t)  P(t|t)  PT{t)  +  Tw 


(4.24) 


Innovation: 


£(t  +  At|t  +  At)  =  l(t  +  At\t)  +  L (t  +  At)  (y(t  +  At)  -  h  (£(t  +  At |t))  ) 

-H  (£(f  +  At\t)  -  £(t  +  At\t  +  At))  (4.25) 

P(t  +  At|t  +  At)  =  T(t  +  At)  P(t  +  At |t)  rT(t  +  At) 

+L(t  +  At)  Zn(t  +  At)  LT(t  +  At)  (4.26) 

For  the  first  iteration  of  the  innovation,  £(t  +  At  |t  +  At)  is  set  equal  to  £(t  +  At|t). 

Gain: 


r(t  +  At)  =  I  -  L(t  +  At)  H(t  +  At)  (4.27) 

L(t  +  At)  =  P(t  +  At 1 1)  HT(t  +  At)  A _1(t  +  At)  (4.28) 

A(t  +  At)  =  H(t  +  At)  P(t  +  At |t)  H T(t  +  At)  +  En(t  +  At)  (4.29) 

Linearization: 

FW  =  ^(«(«l*))  (4.30) 

O  7 

H(t  +  At)  =  T^-(C(t  +  At|t  +  At))  (4.31) 
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Note  that  /  and  y  in  the  above  equations  represent  ffeat(t )  and  Vfeatif)  when  a  fea¬ 
ture  measurement  is  available,  and  fbody(t)  and  ybody(t)  when  a  GPS/INS  measurement  is 
available. 


4.5.5  Subfilters 


-l  T 


y'r  P\ 


A  subfilter  is  created  when  a  new  feature  appears  at  time  r  and  its  state  = 
is  initialized  by  its  projection  onto  the  current  image  plane  (y  ‘T)  and  an  arbitrary  depth  (plT). 
Its  initial  covariance  Pr  is  block  diagonal  with  the  block  corresponding  to  y  ‘T  determined 
by  analysis  of  the  feature  tracker,  and  with  the  variance  of  p\  a  relatively  large  number. 
Each  subfilter  also  stores  (but  never  changes  except  during  recentering)  the  attitude  and 
position  estimate  from  the  main  filter  for  the  time  at  which  the  subfilter  is  initialized.  These 
stored  estimates  of  fl  and  T  at  time  r  are  f2(r|r)  and  T(r|r). 

As  in  [29],  a  sub  filter  implements  the  following  equations: 

Measurement: 


y\t)  =  v(R(Q(t\t))  (yi(t)  (?T{t)  -  T(r|r))  +  T(t\t))  +  n\t)  (4.32) 


where  nl(t)  is  additive  Gaussian  measurement  noise. 
Evolution: 


(4.33) 


P  T(t  +  At\t)  =  PT(t\t)  +  Tw(t) 


(4.34) 


for  t  >  t. 

The  subfilter  innovation  equations  are  identical  to  the  main  filter  innovation  equations 
using  the  above  subfilter  measurement  equation. 

When  the  variance  of  p\  decreases  to  less  than  or  equal  to  the  mean  distance  variance 
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of  all  features  in  the  main  filter  or  when  the  variance  of  plT  decreases  to  less  than  or  equal  to 
a  fixed  threshold,  the  subfilter  is  removed,  the  feature  is  projected  back  into  the  coordinate 
frame  of  the  initial  camera  using 


X-  =  JS(-n(r|r))  -1  (y\ (<)  pUt)  -  T(r |t)) 

(4.35) 

Hot*)  =  *(X‘) 

(4.36) 

A{t)  =  eT3X\ 

(4.37) 

and  the  feature  state  and  covariance  are  inserted  into  the  main  filter.  Initially  the  covariance 
of  the  feature  with  the  rest  of  the  main  filter  state  is  zero. 

Subfilters  are  not  evolved  or  innovated  for  (GPS/INS)  attitude  and  position  measure¬ 
ment  data  because  they  do  not  include  any  reference  frame  state. 

4.5.6  Recentering 

To  reduce  feature  uncertainty  in  the  main  filter,  we  recenter  the  reference  camera  coordinate 
system  when  we  have  no  features  from  the  image  captured  at  the  reference  camera  location 
remaining  in  the  main  filter,  subject  to  a  given  minimum  number  of  feature  innovations 
between  recenterings. 
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We  first  transform  the  main  filter  into  the  new  coordinate  system: 


vim'  =  n(R(n(t\t))  vim  + rm) ,  *  =  1, . . . ,  n  (4.38) 

p\t\t)'  =  el  (. R(Sl(t\t ))  yim  p\t\t)  +  T(t\t)) ,  i  —  1, . . . ,  N  (4.39) 

Q(t|t)'  =  R-\\)  (4.40) 

T(t|t)'  =  0  (4.41) 

nwr(t\t)'  =  i?-1(i?(Ow(f|f))i?(fire/(f|f))i?(0(f|f))-1i?(Ore/(f|f))“1)  (4.42) 

Turrit)1  =  R{Qwr(t\t))(Trei(t\t)  -  R(Qrei(t\t))  R(Q(t\t)yl  A(t|t))  +Twr(t\t), 
where  A(t|t)  4  i?(firei(t (f))"1  Trd(f|f)  +  T(f|f) .  (4.43) 


The  covariance  of  the  main  filter  is  updated  using  the  Jacobian  of  this  transformation. 

We  must  also  correct  the  camera  attitude  and  position  estimate  with  which  each  subfilter 
was  initialized  by  the  same  change  in  coordinate  system: 

0(r|r)'  =  R"1  (R{Q(t\t))  R(Q(t |f))_1)  (4.44) 

T(t\t)'  =  T(t\t)  -  i?(0(T|r))  f?(f2(t|f))_1  T(t\t) .  (4.45) 

Note  that  no  covariance  needs  to  be  updated  for  this  transformation,  since  the  covariance 
of  f2(r|r)  and  T(t|t)  is  not  maintained  in  the  subfilter. 

4.5.7  Tuning 

Zw  in  the  equations  above  is  a  tuning  parameter.  It  is  chosen  to  be  block  diagonal,  with 
zeros  corresponding  to  f 2(f)  and  T(t)  so  that  the  system  is  a  perfect  integrator.  Hw  allows 
control  over  the  smoothness  of  the  estimate  and  depends  on  the  amount  of  noise  in  the 
measurement  data.  In  practice,  Tw  is  chosen  for  a  unit  evolution  At  =  1  and  is  scaled  by 
the  current  At  during  the  evolution. 
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Real  and  Estimated  3D  Position  Over  Time 


Figure  4.3:  Synthetic  experiment:  screw  trajectory  (noiseless  trajectory  blue,  filtered  tra¬ 
jectory  red). 


True  and  Predicted  Features,  First  Frame 


0.8 

0.6 

0.4 

0.2 

>•  0 

-0.2 

-0.4 

-0.6 

-0.8 

-1 


%  %  * 

***%*#  * 
#  *  *  * 

1  *%*  ** 

*  * 

*  **  *  * 

-*%**  *  * 


*  #. 
*  *  k  i 


* 


-0.8  -0.6  -0.4  -0.2 


0.2  0.4  0.6  0.8  1 


Figure  4.4:  Synthetic  experiment:  first  frame  (noiseless  features  blue,  filtered  features  red). 
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Figure  4.5:  Synthetic  experiment:  last  frame  (noiseless  features  blue,  filtered  features  red). 

4.6  Results 

Figures  4. 3-4. 5  show  the  result  of  running  the  filter  on  an  interleaved  sequence  of  50  fea¬ 
ture  measurements  and  50  GPS/INS  measurements.  The  data  is  synthetic,  which  allows  a 
comparison  to  ground  truth,  with  the  input  to  the  filter  corrupted  by  additive  Gaussian  noise 
with  standard  deviation  0.01.  Figure  4.3  shows  the  vehicle  trajectory,  a  shallow  screw  mo¬ 
tion  in  three  dimensions.  There  are  85  features  per  frame,  and  the  input  is  constructed  such 
that  all  features  remain  in  the  camera’s  field  of  view  for  the  entire  motion.  Figures  4.4  and 
4.5  show  the  camera  image  plane  with  the  noiseless  (before  corruption)  features  in  blue 
and  the  filtered  features  (the  output  of  the  filter)  in  red.  Figure  4.4  shows  the  first  usable 
frame  in  the  sequence  (the  first  frame  is  used  for  initialization)  and  Figure  4.5  shows  the 
last  frame  in  the  sequence.  Focal  length  filtering  is  not  used  for  this  sequence  because  the 
focal  length  is  not  changing  over  time. 

Although  the  filtered  trajectory  in  Figure  4.3  is  not  as  smooth  as  the  noiseless  trajectory, 
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it  does  a  reasonable  job  of  following  the  motion  considering  that  the  input  attitude  and 
position  are  corrupted  by  noise  with  standard  deviation  0.01.  From  Figures  4.4  and  4.5, 
the  results  appear  to  be  particularly  sensitive  to  radial  noise,  which  is  not  surprising  since 
radial  noise  is  highly  nonlinear  and  the  filter  optimizes  based  on  repeated  linearizations. 
Overall,  the  results  are  acceptable  but  not  ideal. 

4.7  Conclusion 

This  chapter  has  described  a  filter  that  integrates  feature  measurements  from  a  feature 
tracker  with  attitude  and  position  measurements  from  an  onboard  GPS/INS  unit,  taking 
into  account  all  of  the  necessary  coordinate  frames  and  the  transformations  between  them. 
The  filter  is  also  able  to  evolve  the  state  by  an  arbitrary  nonnegative  time  and  to  predict 
the  locations  of  features  in  the  current  frame.  This  has  yielded  a  filter  with  acceptable  per¬ 
formance,  although  it  is  not  yet  suitably  accurate  for  the  adverse  conditions  of  the  aerial 
mapping  and  landing  problem. 
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Chapter  5 


Analytical  Motion  Stamping 

5.1  Introduction 

After  experimenting  with  the  probabilistic  method  discussed  in  the  previous  chapter,  we 
described  an  analytical  approach.  Here  we  outline,  and  analyze  the  robustness  of,  this 
analytical  motion-stamping  algorithm. 

The  method  discussed  in  this  chapter  assumes  that  the  intrinsic  calibration,  i.e.  the  focal 
length  and  center  of  projection  (and  other  parameters  such  as  skew  and  radial  distortion  if 
they  are  necessary),  of  the  camera  is  known.  It  also  assumes  that  all  sensors  are  colocated 
in  both  location  and  orientation;  more  realistically,  it  assumes  that  all  sensors  have  been 
calibrated  and  that  their  measurements  have  been  corrected  to  a  common  reference  frame. 
These  calibration  algorithms  are  discussed  in  Section  6.4. 

The  remainder  of  this  chapter  is  organized  as  follows:  We  begin  by  describing  the 
analytical  motion  stamping  algorithm  in  Section  5.2.  Section  5.3  provides  a  sensitivity 
analysis  and  Section  5.4  discusses  results  on  realistic  synthetic  data.  Finally,  Section  5.5 
concludes. 
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5.2  Algorithm 


To  fulfill  the  RMFPP  assumption  that  the  motion  is  known,  we  must  motion  stamp  each  im¬ 
age  (determine  the  world  location  and  orientation  of  the  camera  when  it  was  captured).  Al¬ 
though  we  have  location  and  orientation  measurements  available  from  helicopter  GPS/INS 
sensors,  they  are  noisy  (particularly  the  location  measurements,  which  are  based  on  GPS) 
and  do  not  in  general  correspond  to  times  when  images  are  captured.  They  do,  however, 
provide  absolute  measurements  of  motion  that,  intuitively,  we  can  use  to  correct  for  drift 
due  to  integrating  the  relative  motion  information  provided  by  the  camera  over  time. 

Because  the  helicopter  GPS/INS  sensors  are  noisy,  we  (sparsely)  track  distinctive  fea¬ 
tures  between  images  and  use  these  tracks  to  improve  the  accuracy  of  the  online  motion 
estimation.  Although  feature  tracks  do  not  give  full  orientation  and  location  information, 
they  do  give  very  accurate  two-dimensional  projections  of  relative  orientation  and  location. 
We  also  assume  that  the  helicopter  motion  is  smooth.  Note  that  the  system  must  run  in  real 
time,  but  that  it  may  delay  all  frames  by  a  constant  amount  of  time  to  use  ‘future’  infor¬ 
mation  to  estimate  the  location  and  orientation  of  the  camera  when  an  image  was  captured; 
this  only  results  in  a  landing  site  being  detected  slightly  later  than  it  otherwise  would  be. 

The  online  motion  estimation  algorithm  first  computes  an  initial  value  for  the  camera 
orientation,  then  combines  a  polynomial  fit  on  the  helicopter  location  measurements  with 
information  from  feature  tracks  using  least  squares  optimization,  and  finishes  with  a  non¬ 
linear  refinement. 

5.2.1  Orientation 

Because  the  orientation  measurements  given  by  the  helicopter  sensors  are  relatively  noise¬ 
less,  because  the  relative  orientation  of  the  camera  is  independent  of  its  relative  position, 
and  because  the  location  estimation  requires  it,  the  camera  orientation  is  estimated  first. 
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The  orientation  estimate  is  the  R  that  minimizes 


y^W;||R  -  R|||f  (5.1) 

i 

as  in  [32],  where  the  Rj  are  preliminary  estimates  of  R  from  feature  tracks  and  from  inter¬ 
polation  of  calibrated  GPS/INS  vehicle  orientation  measurements,  and  the  Wi  are  weights 
that  decrease  with  increasing  time  between  when  the  current  image  was  captured  and  when 
the  data  used  to  estimate  Rj  was  obtained  (they  can  also  adjust  the  overall  weight  between 
the  two  sources  of  Ri  estimates).  The  minimum  value  is  achieved  when  R  is  the  polar  factor 
of  the  polar  decomposition  [33]  of 


Q  =  u^Rj  (5.2) 

i 

provided  that  Q  has  positive  determinant  [32]. 

Each  feature  track  R;  is  obtained  by  estimating  an  essential  matrix  E  (assuming  that  the 
scene  is  nonplanar)  and  a  homography  H  (assuming  that  the  scene  is  planar)  between  the 
current  image  and  a  past  image,  computing  ARj  from  whichever  of  E  and  H  is  a  better  fit, 
and  combining  ARj  with  the  final  orientation  estimate  of  the  past  image.  The  procedures 
for  estimating  E  and  H  from  feature  correspondences  and  for  computing  ARj  using  these 
estimates  are  beyond  the  scope  of  this  report;  see  [34]  for  details. 

Each  GPS/INS  Rj  is  obtained  by  interpolating  calibrated  vehicle  orientation  measure¬ 
ments  from  before  and  after  the  current  image  was  captured: 


Ri  =  (Ri2Riir)7 


t-tji 

«-*«■  Rji 


(5.3) 


where  t  is  the  time  of  the  current  frame,  RjX  and  Ri2  are  GPS/INS  orientation  measurements 
at  times  tn  and  ti2,  and  tn  <  t  <  ti2. 
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5.2.2  Location 


The  camera  location  estimate  is  a  least  squares  estimate  that  combines  a  weighted  polyno¬ 
mial  fit  of  the  calibrated  vehicle  locations  with  additional  equations  based  on  feature  cor¬ 
respondences.  Let  t  be  the  time  of  the  image  that  we  are  currently  trying  to  motion  stamp, 
let  {(tj,  Tj)}  be  calibrated  vehicle  location  measurements  both  before  and  after  time  t,  and 
define  A L  =  t ,  —  t.  To  calculate  a  k-th  order  polynomial  fit  on  the  calibrated  vehicle 
location  measurements,  we  require  linear  equations  of  the  form 


«ifc 


A ti 
0 
0 


At}  1  0  •  •  •  0  0  0 

0  0  A t\  ■■■  At}  1  0 

0  0  0  •••  0  0  At} 


0  0 
0  0 
At}  1 


an 
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=  Ti  (5.4) 


«31 

«30 


where  apq  is  the  coefficient  of  the  gth  order  term  in  the  polynomial  fit  for  the  p-th  dimension 

of  the  location.  In  practice  we  normalize  each  row  of  this  equation  by  the  norm  of  its  row 

in  the  left-hand  matrix  or  by  a  small  constant,  whichever  is  larger,  and  scale  by  exp  to 

regularize  the  system  and  then  give  relatively  more  weight  to  measurements  that  are  closer 

in  time  to  t.  Note  that  this  fit  is  constructed  so  that  the  camera  location  at  time  t  (the  location 

r  iT 


of  interest)  is  trivially  extracted  from  the  parameter  vector  by  T 


«10  «20  a30 
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Consider  a  feature  i  that  is  visible  in  the  current  image  and  in  a  past  image  j  that  has 
been  previously  motion  stamped.  Let  Xi  =  Xi  y,  1  and  Xij  =  Xij  1 
be  the  calibrated  homogeneous  locations  of  the  feature  in  the  current  and  j-th  images  re¬ 
spectively.  The  locations  of  the  feature  in  the  three-dimensional  coordinate  frames  of  the 
current  and  j-th  cameras  are  A,x'7;  and  A ijXij  respectively  for  some  A,  and  Xir  Define  the 
relative  orientation  and  location  of  the  current  and  j-th  camera  reference  frames  by  Rd.  Td 
such  that  Rd  (A jXi)  +  Td  =  XVJXij.  Then  Rd  =  Rj7  R  and  Td  =  Rj7  (T  —  Tj)  where  R.  T 
is  the  orientation  and  location  of  the  current  frame  and  Rj,  Tj  is  the  orientation  and  location 
of  the  j-th  frame.  We  can  eliminate  one  unknown  from  Rd  (X,Xi)  +  Td  =  XijXij  by  taking 
the  cross  product  with  x^  on  both  sides;  then  A;  ( Xij  x  Rda;.j)  +  x^  x  Td  =  0.  Substitut¬ 
ing  for  Rd,  Td  gives  Aj  ( x^  x  Rj2  Rcc*)  +  x^  x  Rj2  T  =  x^  x  Rj2  Tj.  Note  that  the  only 
unknowns  are  T  and  A  j.  We  can  write  this  equation  as 

OL 10 

m  ml  ^20  rp 

Xij  x  Rjr  Xij  x  RjTR*i  =  Xij  x  Rj  Tj  (5.5) 

-I  a30 

.  A*  .. 

r  i  t 

where  we  use  the  identity  T  =  ai0  a20  Q.:;j0  .  Note  that  the  three  equations  given 

above  are  not  independent,  so  we  only  use  the  first  two  rows.  We  normalize  each  row 
by  the  norm  of  its  row  in  the  left-hand  matrix  and  then  scale  by  p  ^1  —  exp  \  j ,  where 
At  =  tj  —  t  and  tj  is  the  time  of  the  j-th  frame;  this  regularizes  the  system,  weights 
equations  for  frames  that  are  farther  apart  in  time  more  heavily,  and  weights  the  feature 
error  relative  to  the  polynomial  fit  error. 

By  stacking  many  equations  corresponding  to  calibrated  vehicle  locations  (polynomial 
fit)  and  many  equations  corresponding  to  feature  pairs,  we  obtain  a  least  squares  problem 
in  {apq}  and  {A,}.  In  practice  we  use  all  helicopter  location  measurements  between  Mp 
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frames  before  the  current  frame  and  Mj  frames  after  the  current  frame  (we  delay  RMFPP 
reconstruction  by  Mf  frames).  We  use  the  N  features  in  the  current  frame  that  have  been 
tracked  the  longest  and  include  equations  of  the  above  form  for  all  of  the  past  Mp  images 
in  which  they  appear  (this  allows  a  more  accurate  estimate  of  each  A*  and  also  allows  us 
to  introduce  the  fewest  additional  variables  for  a  given  number  of  feature  correspondence 
equations).  Note  that  A  jo;*  gives  the  location  of  the  i-th  feature  in  the  reference  frame  of 
the  current  camera;  using  these  points  and  the  motion  stamp  for  the  current  image,  we  can 
estimate  the  average  height  of  the  terrain  for  the  RMFPP  reference  plane. 

5.2.3  Nonlinear  Refinement 

Since  R  and  T  have  been  estimated  separately,  we  perform  a  final  nonlinear  optimiza¬ 
tion  using  Sparse  Bundle  Adjustment  (SBA)  [13].  This  is  a  local  method  based  on  the 
Levenberg-Marquardt  algorithm  that  minimizes  \\y  —  fix)  |  over  x ,  where  as  is  a  vector 
of  the  locations  of  all  features  in  3D  space  as  well  as  the  camera  location  and  orientation 
for  the  current  image  (note  that  in  general  SBA  can  optimize  over  the  camera  locations 
and  orientations  for  an  arbitrary  number  of  frames),  y  is  a  vector  of  all  of  the  feature  mea¬ 
surements  in  the  last  Mp  images  and  in  the  current  image,  and  the  function  /  projects  the 
3D  features  into  the  last  Mp  images  and  the  current  image  using  all  camera  locations  and 
orientations  and  the  camera  intrinsic  calibration  K.  Note  that  the  positions  and  orientations 
of  the  last  Mp  images  are  held  fixed;  only  the  location  and  orientation  of  the  current  image 
is  allowed  to  vary. 

5.2.4  Remark 

Since  the  first  frame  that  the  motion  estimation  processes  will  not  have  any  previous  motion 
stamped  frames  it  uses  only  GPS/INS  R;  and  performs  only  the  polynomial  fit  on  this  image. 
It  subsequently  uses  this  motion  stamp  (and  then  motion  stamps  derived  from  this  motion 
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stamp)  to  calculate  new  motion  estimates.  However,  it  quickly  converges  to  more  accurate 
results  as  this  initial  estimate  becomes  less  influential.  We  turn  on  the  RMFPP  after  the 
motion  estimation  has  converged. 

5.3  Robustness 

In  this  section  we  examine  the  robustness  of  the  least-squares  problem  of  minimizing 
1 1  Aa?  —  b\  1 2  in  the  location  estimate  above.  In  particular,  we  consider  worst-case  minimiza¬ 
tion  over  additive  noise  to  elements  of  A.  A  comparison  of  maxAeA  1 1  /\xLS  —  b\  |2  (where 
xLS  is  the  least-squares  solution)  and  min,.  maxAe  4  ||Acc  —  £»|  |2  for  different  amounts  of 
noise  p  indicates  the  sensitivity  of  the  solution  to  the  additive  noise  in  A. 

Consider  min.,,  maxAeAp  | Acc  —  b||2  where  Ap  =  {A  +  A|  —  vl3  ( p )  <  8.^  <  Vij  (p)}. 
Then  maxAe^(i  || /KxLS  —  b\ |2  is  equal  to  1 1  \AxLS  —  fc»|  -h V|£cZ/,s' 1 1 \2  and  min^  maxAeAp  1 1 Acc  — 
£>|  1 2  is  equal  to  the  square  root  of  min^y^t  t  s.t.  —  y  A  Ax  —  b  A  y,  —w  A  x  A  w,  and 
|  \y  +  V  (p)  w\  1 2  <  t.  The  latter  is  a  Second-Order  Cone  Problem  (SOCP),  which  we  solve 
using  the  YALMIP  [35]  optimization  framework  and  the  SDPT3  [36]  solver. 

5.3.1  Error  In  Position  Measurement  Time 

We  first  consider  the  robustness  of  the  solution  to  error  in  the  time  at  which  the  GPS/INS 
position  is  measured.  Since  our  vision  system  periodically  receives  this  data  over  an  Eth¬ 
ernet  connection  and  the  clocks  on  the  different  computers  are  not  synchronized,  our  pro¬ 
gram  timestamps  the  data  with  the  time  at  which  it  receives  it  (after  an  unknown  delay  on 
the  other  computer,  transmission  time,  and  thread-  and  process-switching  time).  We  model 
this  by  an  error  in  A C  in  Equation  5.4  of  at  most  p  (and  hence  an  error  in  At?  of  at  most 
pP).  In  general  this  is  an  approximation  that  overestimates  the  sensitivity  of  the  solution, 
but  in  this  case  we  are  using  a  linear  fit  in  Tj  so  there  is  only  one  power  of  At?. 

Figure  5.1  shows  the  mean  and  standard  deviation  of  the  solutions  to  maxAg^,  1 1  AxLS— 
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Sensitivity  To  Error  In  Position  Measurement  Time  Sensitivity  To  Error  In  Position  Measurement  Time  -  Standard  Deviation 


(a) 


(b) 


Figure  5.1:  Mean  (a)  and  standard  deviation  (b)  of  minx||Aa;  —  b ||2  (blue), 
maxAe/ip  \\hxLS  —  b 1 12  (red),  and  minx  maxAeylp  1 1 /\x  —  b\  |2  (green)  for  different  magni¬ 
tudes  of  error  in  the  times  of  the  GPS/INS  position  measurements.  Note  that  minx  |  Ax  — 
b\  1 2  is  independent  of  the  maximum  error  p;  it  is  included  for  comparison. 


fo|  1 2  and  min.,.  maxAgA  |  |Acc  —  b\  |2  for  different  values  of  p  using  20  different  A,  b,  and  xLS 
obtained  by  running  the  motion  stamp  algorithm  on  realistic  synthetic  images  and  GPS/INS 
location  and  orientation  data.  The  two  curves  for  the  mean  are  similar  through  p  rs  0.01, 
so  the  method  is  robust  to  variability  in  delay  up  to  At*  ~  0.01  second. 


5.3.2  Error  In  Feature  Equations 

We  now  consider  the  robustness  of  the  solution  to  error  in  the  initial  orientation  estimate  R, 
the  past  orientation  estimates  Rj,  the  feature  measurements  in  the  current  image  Xi,  and  the 
feature  measurements  in  the  past  images  Xij.  This  is  modeled  by  an  error  of  at  most  p  in  all 
elements  of  A  shown  in  Equation  5.5.  In  general  this  is  an  approximation  that  overestimates 
the  sensitivity  of  the  solution  since  the  same  terms  appear  in  different  elements  of  A  and 
since  R  and  Rj  are  constrained  to  be  rotation  matrices. 

Figure  5.2  shows  the  mean  and  standard  deviation  of  the  solutions  to  maxAGA„  1 1 hxLS— 
b\\2  and  min.,.  rnaxAgA,,  1 1 Arc  —  b||2  for  different  values  of  p  using  the  same  data  as  in  the 
section  above.  There  is  no  dramatic  difference  between  the  two  curves  like  there  was  in 
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(a) 


Sensitivity  To  Error  In  Feature  Equations  -  Standard  Deviation 


maximum  error 

(b) 


Figure  5.2:  Mean  (a)  and  standard  deviation  (b)  of  niinx||Aaj  —  b ||2  (blue), 
maxAeAp  \\hxLS  —  b ||2  (red),  and  mirix  maxAe/tp  ||Aaj  —  b ||2  (green)  for  different  mag¬ 
nitudes  of  error  in  the  feature  equations.  Note  that  rninx  |  A.x  —  b 1 12  is  independent  of  the 
maximum  error  p\  it  is  included  for  comparison. 


the  last  section;  the  solution  is  robust  to  error  in  the  feature  equations.  This  is  due  to 
the  fact  that  the  feature  equations  only  provide  relative  information  (it  is  possible  for  the 
noisy  elements  to  be  consistent  with  the  same  solution),  the  fact  that  there  are  many  such 
equations  providing  similar  information,  and  the  fact  that  the  motion  is  also  ‘constrained’ 
by  the  polynomial  fit. 


5.4  Results 


We  test  the  algorithm  on  a  realistic  synthetic  sequence  of  images  and  GPS/INS  data,  with 


zero-mean  Gaussian  noise  with  standard  deviation  3.0  meters  added  to  T*  and  a  quaternion 

T 

multi¬ 


T 

- 

with  mean 

10  0  0 

and  standard  deviation 

0.01  0.01  0.01  0.01 

plied  onto  R|.  The  images  were  rendered  from  satellite  imagery  and  Digital  Terrain  Eleva¬ 
tion  Data  (DTED)  elevation  data  (see  sample  rendered  image  in  Figure  7.1)  at  2.5  fps  and 
the  synthetic  GPS/INS  data  arrived  at  33.3  Hz.  The  system,  including  feature  tracking  and 
RMFPP,  ran  in  real  time  on  an  Intel  Pentium  M  1.6  GHz  processor. 

Figure  5.3  shows  the  true  (before  noise  was  added)  and  estimated  trajectory,  altitude, 
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(a)  (b) 


Figure  5.3:  True  (blue)  and  estimated  (red)  trajectory  (a),  altitude  (b,  upper),  and  average 
terrain  height  (b,  lower)  using  realistic  synthetic  imagery  and  GPS/INS  data  with  zero- 
mean  Gaussian  noise  with  3.0  meters  standard  deviation  added  to  T*  and  a  quaternion  with 
mean  [1,  0,  0,  0]7  and  standard  deviation  [0.01,  0.01,  0.01,  0.01]7  multiplied  onto  Ri. 


and  average  terrain  height.  Note  that  the  estimated  trajectory  drifts  from  the  true  trajectory, 
but  this  is  expected  because  of  the  high  weight  put  on  the  feature  equations  compared  to 
the  polynomial  (linear  in  this  case)  fit  in  GPS/INS  position  due  to  the  large  amount  of  noise 
in  the  GPS/INS  measurements.  The  altitude  and  average  terrain  height  remain  within  100 
meters  of  their  true  values.  The  algorithm  tends  to  maintain  the  altitude  at  a  constant  height 
above  the  terrain,  evidenced  by  the  the  two  estimated  altitudes  being  close  to  parallel.  As 
desired,  the  estimates  are  locally  consistent  even  though  they  drift  over  time. 


5.5  Conclusion 

We  have  motivated,  described,  and  analyzed  an  analytical  algorithm  for  determining  the 
location  and  orientation  of  a  camera  using  images  and  noisy  GPS/INS  location  data  that 
runs  together  with  RMFPP  in  real  time.  We  have  shown  that  the  trajectory  that  it  estimates 
on  noisy  synthetic  data  drifts  but  is  locally  consistent,  although  its  accuracy  is  not  sufficient 
for  the  Recursive  Multi-Frame  Planar  Parallax  algorithm  in  practice. 
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Chapter  6 


Experimental  Platform 


The  content  of  Sections  6.2-63  is  based  on  unpublished  work  coauthored  with  David  Shim, 
Christopher  Geyer,  and  Shankar  Sastry  [2 ].  This  material  is  used  with  the  permission  of 
the  coauthors. 


6.1  Introduction 

Although  the  final  platform  target  for  the  aerial  mapping  landing  system  was  the  Boeing 
Maverick,  a  modified  Robinson  R22  (a  full-sized  2-passenger  helicopter),  with  vehicle 
communication  using  the  Open  Control  Platform  (OCP),  the  Boeing  system  was  largely  a 
black  box.  In  this  chapter  we  discuss  the  control  and  architecture  of  the  Berkeley  surrogate 
platform,  as  well  as  the  sensor  calibration  that  was  essential  to  operation  on  both  platforms. 

6.2  Vehicle  Setup 

The  testbed  used  in  this  research  is  based  on  an  electrically  powered  RC  helicopter,  whose 
detailed  specifications  are  given  in  the  Table  6.1.  The  DC  brushless  motor  with  high- 
capacity  Lithium-polymer  batteries  allows  more  than  10  minutes  of  continuous  flight  with 
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Base  platform 

Electric  Helicopter  (Maxi-Joker  2) 

Dimensions 

0.26  m  (W)  x  2.2  m  (L)  x  0.41  m  (H) 

Rotor  Diameter 

1.8  m 

Weight 

4.5  kg  (no  onboard  electronics) 

7.5  kg  (fully  instrumented) 

Powerplant 

Actro  32-4  motor  (1740W  max  at  75A) 
Lithium-Ion-Polymer  (10S4P;  40V  8Ah) 

Operation  Time 

15  minutes 

Avionics 

Navigation:  DGPS-aided  INS 

GPS:  NovAtel  OEM4-MillenRT2 

IMU:  Inertial  Science  ISIS-IMU 

Flight  Computer:  PC  104  Pentium  TIT 
700MHz 

Communication:  IEEE  802. 1  lb  with 

RS-232  MUX 

Vision  Computer:  PC  104  Pentium  M 

1.6GHz 

Autonomy 

Waypoint  navigation  with  automatic  VTOL 
Position-tracking  servo  mode 

MPC-enabled  dynamic  path  planning  with 
collision  avoidance 

Stability-augmentation  system 

Table  6.1:  Specification  of  the  UAV  testbed.  Courtesy  of  David  Shim. 


the  ease  of  fully  automatic  start-stop  operation.  The  onboard  components  are  designed 
and  integrated  with  emphasis  on  weight  reduction  for  longer  flight  time,  reliability,  and 
maneuverability.  The  vehicle  is  controlled  by  a  PC  104  form  factor  Pentium  III  700MHz 
CPU  with  a  custom  servo  interfacing  board,  an  inertial  measurement  unit  (IMU),  a  high- 
precision  carrier-phase  differential  global  positioning  system,  and  an  IEEE  802.1  lb  device 
(see  Figure  6.1).  The  flight  control  system  communicates  with  the  ground  station  over  the 
wireless  channel  for  receiving  commands  and  sending  the  navigation  status  and  system 
vital  signs  such  as  the  battery  level  and  the  health  of  onboard  components. 

The  vision-based  terrain  mapping  and  analysis  system  is  implemented  on  a  PC  104  form 
factor  Pentium  M  computer  running  Linux.  The  CPU  is  interfaced  with  a  2GB  Compact 
Flash  drive,  a  Firewire  board,  and  Personal  Computer  Memory  Card  International  Asso¬ 
ciation  (PCMCIA)  wireless  Ethernet.  As  shown  in  Figure  1.1,  the  vision  computer  is  in- 
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Figure  6.1:  System  architecture  of  the  UAV  testbed  for  vision-based  landing  and  mapping. 
Courtesy  of  David  Shim. 
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stalled  in  the  nose  of  the  vehicle  in  a  dedicated  aluminum  enclosure  for  modularity  and 
EMI  shielding.  A  Firewire  camera  is  installed  in  a  forward-  and  downward-looking  di¬ 
rection  to  capture  the  ground  ahead  of  and  below  the  vehicle.  The  vision  system  receives 
vehicle  state  data  from,  and  sends  trajectory  commands  to,  the  flight  computer  through  the 
RS-232  serial  interface. 

As  described  in  Section  2.6,  the  vision  system  requests  appropriate  flight  patterns  as 
it  performs  real-time  terrain  mapping  and  analysis  at  varying  resolutions.  The  flight  con¬ 
trol  system  is  responsible  for  guiding  the  vehicle  through  the  requested  waypoints  with 
an  acceptable  accuracy.  At  a  constant  rate  of  10Hz,  the  flight  control  system  reports  to 
the  vision  system  the  time-stamped  navigation  status  such  as  position  and  attitude/heading, 
which  are  necessary  to  reconstruct  the  terrain  with  respect  to  an  inertial  reference  frame 
(see  Figure  6.1). 

6.3  Vehicle  Control 

The  flight  control  system  consists  of  two  hierarchical  layers:  the  trajectory  generator  and 
the  tracking  controller.  The  trajectory  generation  is  based  on  model  predictive  control 
(MPC)  [37],  which  is  solved  in  real  time  to  find  a  sequence  of  control  inputs  that  minimizes 
an  appropriate  cost  function. 

6.3.1  MPC-based  Trajectory  Generation 

In  [37,  38,  39],  it  was  shown  that  model  predictive  control  using  penalty  functions  for  state 
constraints  and  explicit  input  saturation  [40]  is  a  viable  approach  to  address  the  guidance 
and  control  problems  of  UAVs  at  a  reasonable  computation  load  for  real-time  operation. 
In  [38],  an  MPC-based  control  system  was  shown  to  have  outstanding  tracking  control  in 
the  presence  of  coupled  dynamic  modes  and  substantial  model  mismatch.  It  has  also  been 
demonstrated  that  MPC-based  optimization  could  be  formulated  to  implement  a  higher 
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level  of  autonomy,  such  as  real-time  aerial  collision  avoidance  [37],  and  obstacle  avoidance 
in  urban  environment  using  onboard  laser  scanner  [39].  In  [41],  an  MPC-based  trajectory 
planner  is  implemented  as  the  feedforward  control  part  of  a  two-degree-of-freedom  con¬ 
trol  system.  Here,  as  in  [41],  we  will  use  a  full  vehicle  kinodynamic  model  with  input 
saturation. 

Suppose  we  are  given  a  time  invariant  nonlinear  dynamic  system 

x(k  +  1)  =  f(x(k) ,  u(k)) ,  (6.1) 

where  x  6  X  C  Mn;l:  and  u  e  HJ  C  M""" .  The  optimal  control  sequence  over  the  finite 
receding  horizon  N  is  found  by  solving  the  nonlinear  programming  problem 

k+N- 1 

V(x,k,u)  =  L(x(i),  u(i)  +  F(x(k  +  N)),  (6.2) 

i=k 

where  L  is  a  positive  definite  cost  function  term  and  F  is  the  terminal  cost.  When  applied 
to  the  vision-based  landing  problem,  L  contains  a  term  that  penalizes  the  deviation  from 
the  desired  trajectory.  Suppose  u*(x.  k )  is  the  optimal  control  sequence  that  minimizes 
V (x,  k,  u )  such  that  V*(x,  k)  =  V (x,  k,  u*(x,  k)),  where  V*(x,  k)  <  V (x,  k,  it),  \fu  G 
U.  With  u*(k),  one  can  find  x*(k),  k  =  i,  ■  ■  ■  ,i  +  N  —  1  by  solving  recursively  the 
given  nonlinear  dynamics  with  x(i)  =  x0  (i)  as  the  initial  condition.  We  propose  to  use  the 
control  law 

u(k)  =  u*(k)  +  K(x*(k )  —  x(k)).  (6.3) 

For  the  feedback  control  law  K ,  a  control  strategy  similar  to  that  in  [42]  is  implemented. 
If  the  dynamic  model  used  for  solving  the  optimization  problem  perfectly  matches  the 
actual  dynamics  and  the  initial  condition  without  any  disturbance,  there  should  not  be  any 
tracking  error.  In  the  real  world,  such  an  assumption  cannot  be  satisfied.  In  this  setup, 
with  a  tracking  feedback  controller  in  the  feedback  loop,  the  system  can  track  the  given 
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Figure  6.2:  Flight  control  system  architecture  with  MPC-based  trajectory  generator.  Cour¬ 
tesy  of  David  Shim. 

trajectory  reliably  in  the  presence  of  disturbance  or  modeling  error.  The  architecture  of  the 
proposed  flight  control  system  is  given  in  Figure  6.2. 


6.3.2  Flight  Control  Strategy  for  Autonomous  Vision-based  Landing 

For  automatic  surveying,  the  control  system  should  be  able  to  guide  the  vehicle  through  the 
requested  waypoints  with  minimal  deviation  while  keeping  the  vehicle  well  within  its  dy¬ 
namic  performance  boundary.  The  flight  control  should  not  induce  any  excessive  vibration 
or  rapid  turns  that  may  cause  motion  blur  in  the  camera  image. 

The  vision  system  requests  one  of  three  types  of  flight  patterns  as  described  in  Figure 
2.4.  During  the  box  search  for  coarse  DEM  construction,  the  vision  system  sends  waypoints 
that  are  the  vertices  of  piecewise  linear  segments.  The  vehicle  is  expected  to  fly  along 
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the  linear  segments  with  minimal  deviation  while  making  bank-to-turn  maneuvers  around 
each  vertex  at  a  constant  horizontal  cruise  speed  and  altitude.  During  the  spiral  descent, 
the  vision  unit  sends  a  series  of  waypoints  with  much  finer  resolution.  In  final  descent 
mode,  the  vision  unit  sends  the  landing  coordinates  to  the  flight  control  system,  which  is 
solely  responsible  for  the  stability  and  tracking  accuracy  of  the  descent  maneuver  until 
touchdown.  Since  helicopters  go  through  substantial  perturbation  in  the  dynamics  due  to 
the  ground  effect,  wind  gusts,  and  reaction  force  from  the  ground,  we  believe  it  is  best  to 
leave  the  landing  task  to  the  flight  control  system,  which  is  fine-tuned  for  the  host  vehicle. 

In  order  to  maintain  a  given  trajectory  under  environmental  disturbance,  as  opposed  to 
heading  straight  toward  a  specific  waypoint  regardless  of  the  current  position,  one  has  to 
know  the  previous  waypoint1.  Additionally,  in  order  to  achieve  a  smooth  transition  between 
waypoint  requests  with  a  constant  cruise  speed,  one  has  to  know  the  next  waypoint  a  priori 
while  the  vehicle  is  approaching  the  current  waypoint  so  that  the  flight  control  system  can 
prepare  for  the  bank-to-turn  maneuver  without  any  abrupt  changes  of  heading  or  cruise 
velocity.  The  planning  around  a  waypoint  from  the  current  location  into  a  limited  portion 
of  future  time  perfectly  fits  the  fundamental  idea  of  MPC’s  receding  horizon  framework.  To 
allow  the  vehicle  flight  mode  described  above,  we  use  the  following  data  structure,  which 
specifies  flight  mode,  past/current/future  waypoints  (see  Figure  6.3),  horizontal/vertical 
speed  limit,  heading  rate  limit,  and  a  time  tag,  for  communication  from  the  vision  computer 
to  the  flight  computer  over  a  wired  RS-232  channel. 

typedef  struct  { 

double  GPSTime;  //  time  tag  [sec] 

//  latitude,  longitude,  height 
double  PastWP[3];  //  (rad, rad, m) 
double  CurrentWP [ 3 ] ;  //  (rad, rad, m) 
double  NextWP[3];  //  (rad, rad, m) 

'in  our  implementation,  this  point  is  not  required  to  have  been  the  actual  previous  waypoint;  it  is  merely 
a  point  that  defi  nes  a  vector  of  approach  to  the  current  waypoint. 
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Figure  6.3:  Three-point  waypoint  specification  and  MPC -based  trajectory  generation. 
Courtesy  of  David  Shim. 
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//  cruise  speed  past  WP  to  current  WP 
float  Vcruisel;  //  [m/s] 

//  cruise  speed  current  WP  to  next  WP 
float  Vcruise2;  //  [m/s] 

//  reference  vertical  velocity 
float  Vvert;  //  [m/s] 

//  reference  yaw  rate 
float  yaw_rate;  //  [rad/s] 

//  ABORT (0),  BOX_SEARCH ( 1 ) , 

//  SPIRAL_SEARCH (2) ,  LANDING ( 3 ).. . 

WORD  FlightMode; 

WORD  DataChecksum; 

}  MESSAGEBODY,  *ptMESSAGEBODY; 

6.3.3  Results 

In  order  to  build  the  terrain  map  or  investigate  a  candidate  landing  site,  the  vision  computer 
commands  the  vehicle  to  perform  a  box  search  or  closer  inspection  spiral  by  requesting 
waypoints  following  the  format  defined  above.  The  actual  flight  trajectory  and  other  nav¬ 
igation  states  are  presented  in  Figure  6.4.  The  vehicle  initially  follows  the  waypoints  in  a 
box  pattern,  and  then,  upon  discovering  a  possible  landing  zone,  the  vehicle  is  commanded 
to  fly  in  a  spiral  pattern  at  a  very  low  velocity.  After  completing  the  closer  inspection,  the 
vision  computer  commands  the  vehicle  to  abort  the  spiral  and  resume  the  box  search  at 
normal  flight  speed.  As  shown  in  the  figure,  the  helicopter  follows  this  flight  sequence  with 
acceptable  accuracy. 
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Figure  6.4:  Experimental  trajectory-following  result.  Courtesy  of  David  Shim. 
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6.4  Extrinsic  Calibration 


In  order  to  simplify  the  motion-stamping  problem,  we  assume  that  the  GPS/INS  unit  is 
colocated  with  the  camera  in  position  and  orientation,  or,  more  realistically,  that  its  mea¬ 
surements  can  be  corrected  so  that  they  are  in  the  reference  frame  of  the  camera.  Determi¬ 
nation  of  the  constant  coordinate  transformation  between  the  camera  and  a  GPS/INS  unit 
requires  data  from  both  sensors  during  a  broad  range  of  positions  and  orientations,  which 
is  possible  with  the  small  Berkeley  platform  by  maneuvering  it  manually  (see  Figure  6.5) 
but  which  is  not  possible  with  the  full-sized  Boeing  platform.  Calibration  of  the  camera 
with  the  GPS/INS  unit  on  the  Boeing  platform  requires  the  composition  of  two  coordinate 
transformations:  the  transformation  between  the  vehicle  GPS/INS  unit  and  an  additional 
GPS/INS  unit  on  the  (removable)  camera  platform,  and  the  transformation  between  the  ad¬ 
ditional  GPS/INS  unit  and  the  camera.  Since  the  additional  GPS/INS  unit  is  on  the  camera 
platform,  the  transformation  between  this  sensor  and  the  camera  can  be  estimated  from  data 
collected  by  manual  manipulation  of  the  camera  platform  alone.  The  additional  calibration 
between  the  two  GPS/INS  units  is  required  to  allow  the  system  to  use  the  vehicle  GPS/INS 
unit,  which  is  located  near  the  vehicle’s  center  of  gravity  and  is  of  higher  quality  than  the 
one  on  the  camera  platform. 

6.4.1  GPS/INS  to  GPS/INS 

In  this  section  we  assume  that  the  two  GPS/INS  devices  give  measurements  (G,  Ri,Ti) 
and  (Z2.  R2 •  T2),  where  tt  is  the  time  of  the  measurement,  Rj  is  a  3x3  rotation  matrix  cor¬ 
responding  to  the  orientation  of  the  sensor,  and  T)  is  a  3x1  vector  corresponding  to  the 
location  of  the  sensor.  Rj  and  T)  are  defined  such  that  pw  =  R\Pi  +  X)  where  pi  is  a  point 
in  the  reference  frame  of  the  ?th  GPS/INS  device  and  pw  are  the  coordinates  of  the  same 
point  in  the  world  reference  frame.  We  want  to  find  ( R.  T)  such  that  pi  =  Rp2  +  T. 
Because  the  location  measurements  from  the  GPS/INS  devices  are  very  noisy  and  are  gen- 
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Figure  6.5:  Camera  to  GPS/INS  calibration.  Courtesy  of  David  Shim. 
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erally  inconsistent  with  each  other  (particularly  in  the  vertical  direction),  and  because  we 
can  find  R  without  using  Ti  and  T2,  we  directly  measure  T. 

If  we  had  pairs  of  measurements  Rx  and  R2  at  the  same  times  t  —  t\  =  t2,  we  would 
have  direct  measurements  of  R  given  by  Rx7  R2.  In  practice,  we  use  pairs  of  measurements 
such  that  | ti  —  t2\  <5.  For  each  such  pair  we  form  a  linear  equation  of  the  form 

Ri  0  0 

0  Ri  0 
0  0  Ri 

where  is  the  i,j  element  of  R  and  is  the  i,j  element  of  R2.  Stacking  many  such 
equations,  we  have  Ax  =  b.  We  would  like  to  find  x  that  minimizes  |  \Ax  —  b 1 1|  subject  to 
det  R  =  1  (R  is  a  rotation  matrix).  If  we  relax  the  problem  by  eliminating  the  requirement 
that  R  be  a  rotation  matrix  then  the  unconstrained  optimization  problem  is  a  least  squares 
problem,  which  is  easily  solved  by  x  =  A^b  where  is  the  pseudoinverse  of  A.  If  R  is  the 
solution  found  by  least  squares,  the  closest  orthogonal  matrix  (by  Frobenius  norm)  is  given 
by 

R  =  r(rtr)”\  (6.5) 

the  polar  factor  of  the  polar  decomposition  of  R  [32], [33]. 

6.4.2  Camera  to  GPS/INS 

We  assume  that  the  GPS/INS  device  gives  measurements  (fx,  Rx,Tj)  as  above  and  that 
the  camera  gives  measurements  (t2,  fi,  x\,  yi,  ■  ■  ■ ,  xn,  yn ),  where  f,  indicates  whether 
corner  i  on  the  chess  board  is  visible  in  the  image  and,  if  it  is  visible,  xi}  yi  specifies  its 
location  in  calibrated  image  coordinates.  Ri  and  T)  are  defined  as  above.  Once  again,  we 
want  to  find  (R,  T)  such  that  p1  =  R p2  +  T.  Because  the  location  measurements  from  the 
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GPS/INS  device  are  very  noisy  (particularly  in  the  vertical  direction),  and  because  we  can 
find  R  without  using  Ti  (or  T2  for  that  matter),  we  directly  measure  T. 

Again  we  find  pairs  of  measurements  such  that  \ti  —  t2\  <  5.  We  then  take  random 
pairs  of  these  pairs,  i.e.  the  pair  consisting  of  [(t1,  Ri,  Tx) ,  (t2,  f1,x1,y1, . . . ,  fn,  xn,  yn)} 
and  [{t[,  Rj.  Tj')  ,  (t'2,  f[,  x\ ,  y\ , . . . ,  /(f,  a:X)  ,  y'j] ,  such  that  the  camera  measurements  have 

r  i  t 

at  least  10  chess  board  corners  in  common  (]U  fif[  >  10).  Let  Xi  =  Xi  yi  1  and 

r  i  t 

x'i  =  x\  ]/■  1  .  Using  the  point  correspondences  {a^i,  =  l},  we  compute  a 

homography  H  using  the  normalized  Direct  Linear  Transform  (DLT)  algorithm  [34].  If  all 
of  the  corresponding  world  points  lie  exactly  on  a  plane,  Xx^  =  H.x7;  for  some  A  Vi  such 
that  fif'  =  1  (H  is  only  defined  up  to  scale).  Therefore  xX  x  H.t7:  =  0.  Arranging  into  a 
standard  linear  form,  we  get 

0T  -xj  y'xf 

rp  T  r\T  _  l  T 

*L"l  yJ  <X/  rj  fAs  2 


hn 

h12 

=  0  (6.6) 

h 33 


where  htJ  is  the  i.  j  element  of  H  and  we  have  omitted  the  third  row  because  it  is  linearly 
dependent  on  the  first  two.  Since  the  corner  detection  is  not  exact  (it  is  discretized  to  the 
nearest  pixel),  we  want  to  find  H  that  minimizes  ||Ah|||  subject  to  ||/i||2  =  1,  where  A h 
is  at  least  10  of  the  above  equations  stacked  together  (theoretically  only  8  correspondences 
are  required  to  estimate  a  homography,  but  in  practice  better  results  are  achieved  with  more 
than  8  points).  The  solution  to  this  optimization  problem  is  given  by  the  least  singular 
vector  of  A.  This  is  the  unnormalized  DLT  algorithm.  The  normalized  DLT  algorithm  finds 
similarity  transforms  T  and  T',  each  consisting  of  a  translation  and  a  scaling  (both  in  the 

r  i  t 

xy  plane),  such  that  {Ta^|/j/'  =  1}  and  |T'aV| fif-  =  l}  both  have  mean  0  0  1 
and  average  distance  y/2  from  the  origin  in  the  xy  plane.  It  then  computes  H  using  the 
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unnormalized  DLT  algorithm  on  the  normalized  data;  the  final  result  is  H  =  T/_1HT. 

Since  the  scene  is  planar,  the  relative  rotation  between  the  two  cameras  in  the  pair 
is  given,  up  to  scale,  by  H  [34].  Therefore  we  find  using  Equation  6.5,  using  as 

input  H  normalized  so  that  it  has  positive  determinant  and  one  column  of  unit  norm.  We 
find  Rj7  Rx  directly  from  the  GPS/INS  data.  Since  (^2  ^2  j  =  R7  (^Rj7  Ri  j  R,  for  each 
1,  Ri,  Ti) ,  (t2,  fuXi,  2/1, ,  fn,  Xn,  yn) ,  (t[,  Ri,  T')  ,  (t'2,  f[,  x\ ,  y\ , . . . ,  ,  x'n,  y'n 

we  can  write 


RfRr 

0 

0 


0  0 
RjTRi  0 
0  r;tRi 


g11!  g21l  g31l 
g12l  g22l  g32l 
g13l  g23l  g33l 


\ 


„ii 


,21 


„33 


=  0  (6.7) 


where  rtJ  is  the  i ,  j  element  of  R  and  qlJ  is  the  i,j  element  of  R^7  R2.  Stacking  many 
such  equations  and  minimizing  gives  the  optimization  problem  minimize  |  |Ax|  ||  subject  to 
|  j  cc  1 1 2  =  1  and  det  R  =  1,  which  we  approximate  by  dropping  the  second  equality  con¬ 
straint,  finding  the  least  singular  vector  of  A,  and  using  Equation  6.5  with  input  normalized 
to  have  positive  determinant  and  one  column  of  unit  norm. 


6.5  Conclusion 

In  this  chapter,  we  have  provided  details  of  the  Berkeley  UAV  testbed,  as  well  as  the  ex¬ 
trinsic  calibration  procedure  used  for  both  the  Boeing  and  Berkeley  vehicles.  Experimental 
mapping  results  from  both  vehicles  will  given  in  Chapter  7. 
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Chapter  7 


Results 


The  content  of  this  chapter  is  based  on  published  work  coauthored  with  Christopher  Geyer, 
Marci  Meingast,  and  Shankar  Sastry  [1  ],  and  on  unpublished  work  coauthored  with  David 
Shim,  Christopher  Geyer,  and  Shankar  Sastry  [2].  This  material  is  used  with  the  permission 
of  the  coauthors. 

7.1  Introduction 

In  this  chapter,  we  present  results  of  running  the  mapping  system  on  synthetic  sequences, 
as  well  as  on  real  sequences  from  both  the  Boeing  and  the  Berkeley  platforms.  Since  the 
accuracy  of  the  online  motion  stamping  algorithms  such  as  those  discussed  in  Chapters  4 
and  5  have  thus  far  proven  to  be  insufficient  for  our  needs,  for  the  purposes  of  the  results 
from  real  data  in  this  report  we  separate  the  experiment  into  three  phases  to  showcase 
the  performance  of  the  other  components:  In  the  first  phase,  the  vision  high-level  planner 
directed  the  helicopter  to  perform  the  box  search  while  it  collected  image  data.  In  the 
second  phase,  we  performed  the  offline  motion  stamping  algorithm  described  in  Section 
2.3,  followed  by  RMFPP  running  on  similar  hardware  to  the  helicopter  vision  computer 
in  better  than  real  time.  In  the  third  phase,  we  executed  the  closer  inspection  and  landing 
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Figure  7.1:  Sample  synthetic  image  rendered  from  satellite  imagery  and  Digital  Terrain 
Elevation  Data  (DTED)  elevation  data. 


maneuver  using  a  premade  elevation  and  appearance  map. 


7.2  Results  From  Synthetic  Data 


© 2006  IEEE.  Reprinted,  with  permission,  from  [1  ]. 
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7.3  Results  From  Experimental  Data 


Figure  7.5  shows  a  similar  terrain  reconstruction  result  using  experimental  aerial  im¬ 
agery  from  the  Berkeley  vehicle  at  Richmond  Field  Station.  Although  much  of  the  left  side 
of  the  view  is  flat  and  uneventful  (as  it  should  be),  the  trees  in  the  middle  of  the  view  and 
the  road  on  the  right  are  clearly  visible. 

7.4  Conclusion 

This  chapter  has  given  mapping  results  using  synthetic  imagery,  as  well  as  using  experi¬ 
mental  imagery  captured  from  both  the  Boeing  and  Berkeley  platforms.  Although  we  were 
unable  to  accurately  motion-stamp  the  images  in  real  time,  and  hence  were  unable  to  build 
and  use  the  elevation  and  appearance  map  in  real  time,  we  have  shown  the  abilities  of 
crucial  pieces  of  the  mapping  and  landing  system. 


l©2006  IEEE.  Reprinted,  with  permission,  from  [1  ]. 
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Figure  7.5:  Berkeley  real  images  experiment:  The  reconstructed  appearance  draped  over 
the  reconstructed  elevation.  Reproduced,  with  permission,  from  [2], 
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Chapter  8 


Conclusion 


In  this  report,  we  have  presented  work  toward  a  vision-based  landing  system  for  unmanned 
rotorcraft  in  unknown  terrain  using  our  new  high-accuracy  Recursive  Multi-Frame  Planar 
Parallax  algorithm  [1]  for  terrain  mapping.  Although  we  have  not  yet  succeeded  in  finding 
a  motion-stamping  algorithm  that  performs  to  the  needs  of  our  next-generation  approach  in 
the  adverse  circumstances  of  high-altitude  flight,  we  are  confident  that  such  an  algorithm 
exists,  and  in  the  meantime  we  have  given  an  in-depth  description  of  the  vision  system, 
an  overview  of  our  experimental  platforms,  and  both  synthetic  and  experimental  accurate 
real-time  terrain  mapping  results. 
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